Focused Web Crawling Algorithms

نویسندگان

  • Andas Amrin
  • Chunlei Xia
  • Shuguang Dai
چکیده

Nowadays the web is rich of any kind of information. And this information is freely available thanks to the hypermedia information systems and the Internet. This information greatly influenced our lives, our lifestyle and way of thinking. A web search engine is a complex multi-level system that helps us to search the information that available on the Internet. A web crawler is one of the most important parts of the search engine. It’s a robot that systematically browses and indexes the World Wide Web. A focused web crawler is used crawling only web pages that are relevant to the user given topic or web page link. A focused crawler is a part of the search system that helps user to find most relevant information from the Internet. In our days, this area of computer science is very popular and important for the development of science and technology, because The Internet is constantly and rapidly growing, and the information in it. In this article we will review most effective focused web crawling algorithms that determines the fate of the search system.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not an simple task to download the domain specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed, among them a key technique is focused crawling which is able to crawl particular topical...

متن کامل

Focused Crawling Techniques

The need for more and more specific reply to a web search query has prompted researchers to work on focused web crawling techniques for web spiders. Variety of lexical and link based approaches of focused web crawling are introduced in the paper highlighting important aspects of each. General Terms Focused Web Crawling, Algorithms, Crawling Techniques.

متن کامل

On-line topical importance estimation: an effective focused crawling algorithm combining link and content analysis

Focused crawling is an important technique for topical resource discovery on the Web. The key issue in focused crawling is to prioritize uncrawled uniform resource locators (URLs) in the frontier to focus the crawling on relevant pages. Traditional focused crawlers mainly rely on content analysis. Link-based techniques are not effectively exploited despite their usefulness. In this paper, we pr...

متن کامل

Focused Crawling using Asynchronous Cellular Learning Automata

Web crawling is used to collect the web pages which will be indexed by a search engine. The search engine uses these crawled and indexed pages to answer users’ queries. Since the volume of web pages is very high and it increases continuously, search engines can index a limited number of web pages. Therefore, in recent years, the focused crawler algorithms have been introduced which act selectiv...

متن کامل

Comparing the Quality of Focused Crawlers and of the Translation Resources Obtained from them

Comparable corpora have been used as an alternative for parallel corpora as resources for computational tasks that involve domainspecific natural language processing. One way to gather documents related to a specific topic of interest is to traverse a portion of the web graph in a targeted way, using focused crawling algorithms. In this paper, we compare several focused crawling algorithms usin...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • JCP

دوره 10  شماره 

صفحات  -

تاریخ انتشار 2015